{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accessing an AWS S3 bucket & downloading a csv file\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we will learn how to access an AWS S3 bucket, download a CSV file from that bucket, create a dataframe from the CSV file contents, and finally perform some calculations on the downloaded data. \n",
    "\n",
    "**Note:**  If you are not familiar with manipulating csv files, work with the **simpleCalc.ipynb** jupyter notebook first to become familiar with file manipulation. \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports\n",
    "We will need to import various packages. They are either built in the notebook image you are running, or have been installed in the previous step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#========================================================================================\n",
    "# import needed libraries/packages \n",
    "#\n",
    "# Note:  we use boto3 which is a Python SDK for AWS.  It allows you to create,\n",
    "# configure and manage AWS resources from your Python scripts.  \n",
    "#========================================================================================\n",
    "import os\n",
    "import pandas as pd\n",
    "import boto3\n",
    "import botocore\n",
    "\n",
    "from botocore import UNSIGNED\n",
    "from botocore.client import Config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#========================================================================================\n",
    "# Identify the name of your S3 bucket, and then the file you wish to download.\n",
    "#========================================================================================\n",
    "bucket_name     = 'rhods-pilot'\n",
    "file_name       = 'truckdata.csv'\n",
    "new_file_name   = 'newtruckdata.csv'\n",
    "\n",
    "local_dest_dir  = os.path.join(os.getcwd(), 'datasets/')\n",
    "s3              = boto3.client('s3', config=Config(signature_version=UNSIGNED))\n",
    "s3.download_file(bucket_name, file_name, (local_dest_dir + new_file_name))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#========================================================================================\n",
    "# Read the CSV file data into a dataframe\n",
    "#========================================================================================\n",
    "df              = pd.read_csv(local_dest_dir + new_file_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#========================================================================================\n",
    "# Print the contents of the dataframe\n",
    "#========================================================================================\n",
    "print(df)\n",
    "print(df.mileage)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#========================================================================================\n",
    "# Perform calculations on imported data\n",
    "#========================================================================================\n",
    "total_mileage   = sum(df.mileage)  #or total_mileage = dset['mileage'].sum()\n",
    "print(total_mileage)\n",
    "\n",
    "total_rows      = len(df.index)\n",
    "print(total_rows)\n",
    "\n",
    "average_mileage = (total_mileage/total_rows)\n",
    "print(average_mileage)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
